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ABSTRACT 

The cellular phenotype is described by a complex network 
of molecular interactions. Elucidating network properties that 
distinguish disease from the healthy cellular state is therefore of 
critical importance for gaining systems-level insights into disease 
mechanisms and ultimately for developing improved therapies. 
Recently, several statistical mechanical network properties have been 
studied in the context of cancer interaction networks, yet it is unclear 
which network properties best characterise the cancer phenotype. 
In this work we take a step in this direction by comparing two 
different types of molecular entropy in their ability to discriminate 
cancer from the normal phenotype. One entropy measure, called
flux entropy, is dynamical in the sense that it is derived from
a stochastic process satisfying an approximate diffusion equation 
over the cellular interaction network. The second measure, called 
covariance entropy, does not depend on the interaction network 
and is thus of a static nature. Using multiple gene expression data 
sets of normal and cancer tissue, encompassing approximately 500 
samples, we demonstrate that flux entropy is a better discriminator 
of the cancer phenotype than covariance entropy. Specifically, we 
show that local flux entropy is always increased in cancer relative to 
normal tissue while the local covariance entropy is not. Furthermore, 
we show that gene expression differences between normal and 
cancer tissue are anticorrelated with local flux entropy changes, thus 
providing a systemic link between gene expression changes at the 
nodes and their local information flux dynamics. Finally, we show that 
genes located in the intracellular domain demonstrate preferential 
increases in flux entropy, while the dynamical entropy of genes
encoding membrane receptors and secreted factors is preferentially
reduced. Thus, these results elucidate intrinsic network properties of
cancer and support the view that the observed increased robustness 
of cancer cells to perturbation and therapy may be due to an increase 
in the dynamical network entropy that allows the cells to adapt to the 
new cellular stresses. Thus, using local flux entropy measures may 
also help identify novel drug targets which render cancer cells more 
susceptible to therapeutic intervention. 



*to whom correspondence should be addressed 



INTRODUCTION 

The characterisation of cancer cells in terms of genetic (and 
epigenetic) aberrations has advanced our understanding of cancer 
biology, yet far fewer insights have been gained into systems- 
level properties that define the cancer cell phenotype. Since the 
normal physiological state of a cell is described by a complex 
interaction network, it makes sense to attempt to identify systems-
level properties of cancer by elucidating network properties that
differentiate cancer from normal tissue. In line with this, the
notion of "differential networks" has emerged, which attempts to
better characterise disease phenotypes by studying the changes
in interaction patterns (Taylor et al., 2009; Hudson et al., 2009;
Teschendorff and Severini, 2010; Bandyopadhyay et al., 2010; Califano,
2011; Ideker and Krogan, 2012) as opposed to merely analysing
the changes in mean levels of some molecular quantity (e.g. gene
expression). As demonstrated by these studies (Taylor et al., 2009;
Hudson et al., 2009; Teschendorff and Severini, 2010), differential
networks can identify important gene modules implicated in cancer 
and also provide critical novel biological insights not obtainable 
using other approaches. This differential network strategy has 
recently received further impetus from studies of differential 
epistasis mapping in yeast, demonstrating that differential 
interactions may hold the key to understanding the systems-level 
responses of cells to exogenous and endogenous perturbations,
including those present in cancer cells (Bandyopadhyay et al., 2010;
Califano, 2011). However, from a systems-level perspective it is
still very unclear what network properties best define the cancer 
cell phenotype. We propose that a better characterisation of these 
network properties is important, not only for a deeper understanding 
of cancer systems biology, but also for identifying novel drug targets 
and realizing the promise of personalized medicine. 
One popular and fruitful way to probe the changes in molecular 
interactions underpinning a genetic disease like cancer has been 
to integrate mRNA gene expression data of cancer and normal
[...] been performed at the level of proteins whereby gene expression
values are overlaid onto the nodes of the network (see e.g.
Ulitsky and Shamir, 2007; Chuang et al., 2007), and at the
level of protein interactions whereby weights are assigned to the
edges according to the expression correlation strength (see e.g.
Taylor et al., 2009). Recently, several studies have started to
investigate the statistical properties of these integrated weighted
networks (Taylor et al., 2009; Teschendorff and Severini, 2010;
Schramm et al., 2010; Komurov and Ram, 2010). In fact, from the
correlations in gene expression a natural random walk process on
these networks can be defined by a stochastic "flux" matrix, $p_{ij}$,
which reflects the probability of diffusion along any given edge
$i \to j$ in the network. From this local stochastic matrix one may
then define for each gene (i.e. node $i$) in the network a local flux
entropy,

$$S_i \propto -\sum_{j \in N(i)} p_{ij} \log p_{ij} \qquad (1)$$



where $N(i)$ denotes the neighbors of gene $i$ in the network
(Teschendorff and Severini, 2010). We previously showed that
primary tumours that metastasize exhibit an increase in this
local entropy compared to primary tumours that do not spread
(Teschendorff and Severini, 2010). Moreover, we showed that the
increases in local flux entropy affected many genes in known 
tumour suppressor pathways, supporting the view that this increased 
dynamical disorder is caused by the higher frequency of genomic 
alterations underlying the metastatic phenotype. 
The purpose of the present study is three-fold. First, to extend 
our previous investigation by exploring the changes in flux entropy
between normal and cancer tissue. Second, to extend the notion of
local flux entropy to a non-local/global one, i.e. for subnetworks.
Third, to determine if the observed changes in flux entropy in cancer 
are dependent on the underlying interaction network and network 
dynamics. 

To address the first question, we collected and analysed the 
largest available gene expression data sets encompassing relatively 
large numbers of both normal and cancer tissue, thus allowing 
for an objective comparison between phenotypes. To address the 
second problem we consider a diffusion process over the graph
(Barrat et al., 2008) and define a non-local flux entropy from a
stochastic matrix that satisfies an approximate diffusion equation
over the network. This construction is therefore closely related to
the heat kernel PageRank algorithm (Chung, 2007; Brin and Page,
1998; Barrat et al., 2008). To address the third question, we
consider a different type of molecular entropy, called covariance
entropy, which merely reflects the similarity of gene expression
profiles (van Wieringen and van der Vaart, 2011). In fact, while flux
entropy is derived from a stochastic matrix, the covariance entropy 
is defined from the symmetric covariance matrix and therefore 
lacks a dynamical interpretation in terms of diffusion or random 
walks. Hence, by comparing these different types of molecular 
entropy, we can study how relevant the network dynamics is for 
the characterisation of the cancer phenotype. 



METHODS 

The protein interaction network (PIN) 

We downloaded the complete human protein interaction network 
from Pathway Commons (www.pathwaycommons.org) (Jan. 2011)
(Cerami et al., 2011), which brings together protein interactions
from several distinct sources. We then built a reduced protein
interaction network by integrating the following sources:
the Human Protein Reference Database (HPRD) (Prasad et al.,
2009), the National Cancer Institute Nature Pathway Interaction
Database (NCI-PID) (pid.nci.nih.gov), the Interactome (Intact)
(http://www.ebi.ac.uk/intact/) and the Molecular Interaction Database
(MINT) (http://mint.bio.uniroma2.it/mint/). Protein interactions in
this network include physical stable interactions such as those
defining protein complexes, as well as transient interactions
such as post-translational modifications and enzymatic reactions
found in signal transduction pathways, including 20 highly
curated immune and cancer signaling pathways from NetPath
(www.netpath.org) (Kandasamy et al., 2010). We focused on non-
redundant interactions, only included nodes with an Entrez gene ID
annotation and focused on the maximally connected component,
resulting in a connected network of 10,720 nodes (unique Entrez 
IDs) and 152,889 documented interactions. In what follows we refer 
to this network as the "PIN". 
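
As an illustration of the filtering step described above, the following minimal sketch removes redundant interactions and extracts the largest connected component from an edge list. The function name `largest_connected_component` and the toy integer IDs are hypothetical and are not taken from the original analysis, which may have used different tooling.

```python
from collections import defaultdict, deque

def largest_connected_component(edges):
    """Return (nodes, edges) of the largest connected component.

    `edges` is an iterable of (gene_a, gene_b) pairs. Self-loops and
    duplicate or reversed pairs are dropped, so interactions are non-redundant.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component

    kept = {tuple(sorted((a, b))) for a in best for b in adjacency[a]}
    return best, kept

# Toy example with hypothetical Entrez-like integer IDs.
toy_edges = [(1, 2), (2, 3), (3, 1), (2, 1), (4, 5), (6, 6)]
nodes, kept_edges = largest_connected_component(toy_edges)
```

In this toy input, the reversed duplicate and the self-loop are discarded before the component search.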

Normal and cancer tissue gene expression data sets 

We searched Oncomine (Rhodes et al., 2004) for studies which
(i) had profiled reasonable numbers of cancer and normal tissue
samples (at least ~25 of each type), and (ii) had been
profiled on an Affymetrix platform. In order to reliably estimate
the covariance of two genes across a set of samples, at least ~25
samples are needed. The second criterion reflects the desire
to conduct the study on a common platform, and Affymetrix
arrays are the most widely used. Using the same platform
across studies ensured that the integrated mRNA-PIN networks
were of similar size. In all cases, the intra-array normalised
data was downloaded from GEO (www.ncbi.nlm.nih.gov/geo/) and
quantile normalized, and subsequently probes mapping to the
same Entrez gene ID were averaged. We then subjected each
study that passed these criteria to a quality control step,
which involved a Principal Component Analysis (PCA) to check
that (iii) the dominant component of variation correlated with
cancer/normal status. If not, this indicated to us a more pronounced
source of non-biological variation, which would confound our
downstream analysis. There were six studies satisfying all three
criteria and the tissues profiled included bladder (48 normals
and 81 cancers) (Sanchez-Carbayo et al., 2006), lung (49 normals
and 58 cancers) (Landi et al., 2008), gastric (31 normals and
38 cancers) (D'Errico et al., 2009), pancreas (39 normals and
cancers) (Badea et al., 2008), cervix (24 normals and 33 cancers)
(Scotto et al., 2008) and liver (23 normals and 35 cancers)
(Wurmbach et al., 2007).
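
The two preprocessing operations named above (quantile normalisation across samples, then averaging probes that map to the same Entrez gene ID) can be sketched as follows. This is a minimal illustration using NumPy under the assumption of a probes x samples matrix. The helper names and the toy data are hypothetical, ties are broken by sort order rather than averaged, and the PCA quality-control step is not shown.

```python
import numpy as np

def quantile_normalise(x):
    """Quantile-normalise the columns (samples) of a probes x samples matrix."""
    order = np.argsort(x, axis=0)                 # per-sample ordering of probes
    reference = np.sort(x, axis=0).mean(axis=1)   # mean of the sorted columns
    out = np.empty_like(x, dtype=float)
    for s in range(x.shape[1]):
        out[order[:, s], s] = reference           # ties are broken by sort order
    return out

def average_probes(x, gene_ids):
    """Average rows that map to the same gene ID; returns (genes, matrix)."""
    gene_ids = np.asarray(gene_ids)
    genes = np.unique(gene_ids)
    averaged = np.vstack([x[gene_ids == g].mean(axis=0) for g in genes])
    return genes, averaged

# Toy data: 4 probes x 3 samples, two probes mapping to hypothetical gene 10.
expr = np.array([[5.0, 4.0, 3.0],
                 [2.0, 1.0, 4.0],
                 [3.0, 4.5, 6.0],
                 [4.0, 2.0, 8.0]])
norm = quantile_normalise(expr)
genes, gene_expr = average_probes(norm, [10, 10, 20, 30])
```

After normalisation every sample shares the same empirical distribution, which is the property that makes expression correlations comparable across arrays.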



Integrated PIN-mRNA expression networks and the 
stochastic information flux matrix 

For a given cellular phenotype (i.e. cancer or normal), we build
an integrated mRNA-PIN using the same procedure as described in
(Teschendorff and Severini, 2010). Briefly, edge weights in the PIN
were defined by a stochastic matrix $p_{ij}$,

$$p_{ij} = \frac{w_{ij}}{\sum_{k \in N(i)} w_{ik}} \qquad (2)$$

with $\sum_{j \in N(i)} p_{ij} = 1$, where $N(i)$ denotes the neighbors of gene
$i$ in the PIN and where $w_{ij} = \frac{1}{2}(1 + C_{ij})$ denotes the transformed
Pearson correlation coefficient $C_{ij}$ of gene expression between
genes $i$ and $j$ across the samples belonging to the given phenotype.
This definition of $w_{ij}$ reflects our desire to treat correlations and
anti-correlations differently. We also note that we enforce $p_{ij} = 0$
whenever $(i, j)$ is not an edge in the PIN. Thus, the integrated
mRNA-PINs with the edge weights as defined by $p_{ij}$ can be viewed
as approximate models of signal transduction flow (as measured
by positive gene-gene correlations in expression) subject to the
structural constraint of the PIN. Applying this procedure to the two
phenotypes yields two integrated PIN-mRNA networks, one for
the cancer phenotype with stochastic matrix $p^{(C)}_{ij}$ and one for the
normal phenotype with stochastic matrix $p^{(N)}_{ij}$.
It is important to stress that we have approximated signal
transduction flux on the PIN by positive correlations in expression
between interacting genes. This is obviously a crude approximation
and therefore a limitation of this study; however, until other types
of matched molecular data (e.g. protein expression, phosphorylation
and other post-translational modification states) become available
on a genome-wide basis, we are restricted to the use of only
gene expression data. Nevertheless, some important rationale
and justification for the use of gene expression correlations to
approximate signaling flux over the network will be provided by
careful comparison of the local correlations to those which are
non-local.

A heat kernel stochastic matrix

It is clear that the stochastic matrix $p_{ij}$ above defines a (biased)
random walk on the network $\mathcal{N}$. One may thus compute an
information (or probability) flux between any two nodes $i$ and $j$
in $\mathcal{N}$ (Estrada and Rodriguez-Velazquez, 2005). In fact, it is clear
that the probability flux of moving from $i$ to $j$ over a path of length
$L$ is given by $(p^L)_{ij}$. It follows that the total probability flux $E_{ij}$
between $i$ and $j$ is given by

$$E_{ij} = \gamma \sum_{L=1}^{\infty} \alpha_L (p^L)_{ij} \qquad (3)$$

where $\gamma$ is a normalisation factor and where we have introduced a
set of arbitrary weights $\alpha_L$, to allow variable contributions for paths
of different lengths. One possibility is to suppress paths of longer
lengths using $\alpha_L = 1/L!$, which also guarantees convergence of the
infinite series (Estrada and Rodriguez-Velazquez, 2005). Formally,
defining $\alpha_L = t^L/L!$, we obtain the stochastic matrix

$$K_{ij}(t) = \frac{1}{e^t - 1} \sum_{L=1}^{\infty} \frac{t^L}{L!} (p^L)_{ij} \qquad (4)$$

where we have introduced a "temperature" parameter $t$ (Chung,
2007). This stochastic matrix is a modified version of the heat-kernel
stochastic matrix (Chung, 2007) and satisfies

$$\partial_t K(t) = -K(t)(I - p) + \frac{p - K(t)}{e^t - 1} \qquad (5)$$

where we have suppressed matrix indices and where $I$ denotes the
identity matrix. Since $p_{ij}, K_{ij}(t) \le 1 \;\forall i, j, t$, it follows that
for sufficiently large temperatures ($t \gg 1$), $K(t)$ approximates a
solution of the heat-diffusion equation (Chung, 2007)

$$\partial_t K(t) \approx -K(t)(I - p) \qquad (6)$$

Thus, the choice $\alpha_L = t^L/L!$ leads to a natural interpretation in terms
of a discrete approximate diffusion process on a graph (Barrat et al.,
2008).

The information flux entropy

Given the matrix $K_{ij}$, let $Q$ denote the number of non-zero $K_{ij}$, i.e.
$Q = \sum_{ij} I(K_{ij} > 0)$ where $I$ is here the indicator function. We
then define an information flux entropy as

$$S_{\mathcal{N}}(t) = -\frac{1}{\log Q} \sum_{ij} K_{ij}(t) \log K_{ij}(t) \qquad (7)$$



where we have rescaled $K_{ij}(t)$ by $1/n$ in order to ensure that
$\sum_{ij} K_{ij}(t) = 1$. Note that the information flux entropy defined
above can be thought of as a non-equilibrium entropy, since
the stationary distribution $\pi$ of $K_{ij}$, defined by $\sum_i \pi_i K_{ij} = \pi_j$,
was not included. Our choice to consider this non-equilibrium 
version is motivated by our desire to objectively compare the flux 
entropy to the covariance entropy, which does not have a stationary 
distribution, as explained in the next subsection. 
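
To make the construction concrete, the sketch below evaluates the heat kernel stochastic matrix as the normalised series $\sum_{L \ge 1} (t^L/L!)\,p^L / (e^t - 1)$ and the non-equilibrium flux entropy of a given row-stochastic matrix. It is a minimal illustration in NumPy, not the implementation used in this work: the series is truncated at a hypothetical `max_terms`, the function names are our own, and the three-node matrix `p_toy` is invented for the example.

```python
import numpy as np

def heat_kernel_matrix(p, t=1.0, max_terms=60):
    """K(t) = (e^t - 1)^(-1) * sum_{L>=1} t^L / L! * p^L, via a truncated series."""
    n = p.shape[0]
    term = np.eye(n)           # holds t^L / L! * p^L, starting at L = 0
    total = np.zeros((n, n))
    for L in range(1, max_terms + 1):
        term = term @ p * (t / L)
        total += term
    return total / np.expm1(t)

def flux_entropy(K):
    """Non-equilibrium flux entropy after rescaling K by 1/n."""
    n = K.shape[0]
    k = K / n                  # entries now sum to 1
    nonzero = k[k > 0]
    Q = nonzero.size           # number of non-zero entries
    return float(-(nonzero * np.log(nonzero)).sum() / np.log(Q))

# Toy row-stochastic matrix on a fully connected three-node network.
p_toy = np.array([[0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5],
                  [0.5, 0.5, 0.0]])
K_toy = heat_kernel_matrix(p_toy, t=2.0)
S_toy = flux_entropy(K_toy)
```

Because each power of a row-stochastic matrix is itself row-stochastic, the rows of `K_toy` sum to one up to truncation error, which is why the factor $1/(e^t - 1)$ acts as the normalisation.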
Suppose now that we consider diffusion/flux over paths of
maximum length 1. Then, this leads to $K_{ij} = p_{ij}/n$ where $n$ is
the number of nodes in $\mathcal{N}$ (we have set $t = 1$ for convenience).
This leads to the expression

$$S^{(1)}_{\mathcal{N}} = -\frac{1}{\log Q} \sum_{ij} \frac{p_{ij}}{n} \log \frac{p_{ij}}{n} = \frac{1}{n \log Q} \sum_{i} S_i \log k_i + \frac{\log n}{\log Q}$$

In the above expression, $S_i$ is the local flux entropy of node $i$
(Barrat et al., 2008; Teschendorff and Severini, 2010),

$$S_i = -\frac{1}{\log k_i} \sum_{j \in N(i)} p_{ij} \log p_{ij} \qquad (8)$$

where $k_i$ is the degree of node $i$ and the normalisation factor ensures
that the maximum attainable entropy is equal to 1, independent of
the degree of the node.
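
The local quantities can be sketched directly from the definitions: edge weights $w_{ij} = \frac{1}{2}(1 + C_{ij})$ restricted to PIN edges, row-normalised to give $p_{ij}$, followed by the degree-normalised entropy of each row. The code below is a minimal illustration with hypothetical function names and random toy data. It assumes a symmetric 0/1 adjacency matrix with zero diagonal, every node having degree of at least 2, and no row whose weights are all zero.

```python
import numpy as np

def stochastic_matrix(expr, adjacency):
    """Row-stochastic p_ij with w_ij = (1 + C_ij) / 2 on PIN edges only."""
    corr = np.corrcoef(expr)                   # genes x genes Pearson correlations
    w = 0.5 * (1.0 + corr) * adjacency         # zero weight for non-edges
    return w / w.sum(axis=1, keepdims=True)

def local_flux_entropy(p, adjacency):
    """Local entropy S_i, normalised by log(k_i); assumes every degree >= 2."""
    degree = adjacency.sum(axis=1)
    safe_p = np.where(p > 0, p, 1.0)           # log(1) = 0 removes absent edges
    raw = -(p * np.log(safe_p)).sum(axis=1)
    return raw / np.log(degree)

# Toy data: 4 genes x 12 samples and a small hypothetical interaction network.
rng = np.random.default_rng(0)
expr = rng.normal(size=(4, 12))
adjacency = np.array([[0, 1, 1, 1],
                      [1, 0, 1, 0],
                      [1, 1, 0, 1],
                      [1, 0, 1, 0]], dtype=float)
p = stochastic_matrix(expr, adjacency)
S_local = local_flux_entropy(p, adjacency)
```

A node whose outgoing flux is spread evenly over its neighbours attains the maximum value of 1, whereas a node that channels most flux along a single edge has entropy close to 0.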

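Next, we can consider flux over paths up to length two, in which
case

$$K^{(2)}_{ij} = \frac{p_{ij} + \frac{1}{2}(p^2)_{ij}}{\frac{3}{2} n} \qquad (9)$$

and the corresponding entropy,

$$S^{(2)}_{\mathcal{N}} = -\frac{1}{\log Q} \sum_{ij} K^{(2)}_{ij} \log K^{(2)}_{ij} \qquad (10)$$

In principle, we can estimate the flux entropy $S^{(h)}_{\mathcal{N}}$ for paths of
arbitrary order $h$. In this case,

$$K^{(h)}_{ij} = \frac{1}{n} \left( \sum_{r=1}^{h} \frac{1}{r!} \right)^{-1} \sum_{r=1}^{h} \frac{1}{r!} (p^r)_{ij} \qquad (11)$$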



In this work we compute flux entropies up to moments of order 5
using the R-package expm. Not going beyond $h = 5$ is justified
for two reasons: (i) the most interesting behaviour is found for
$h < 3$, (ii) the computational cost for $h = 5$ is considerable,
for instance, estimation of flux entropy and associated sampling
variance estimates for a typical data set of 30 samples and ~7500
nodes at $h = 5$ takes at least ~20 hours on a high-performance
quad processor workstation.
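
For small networks, the truncated flux matrix for paths up to order $h$ (with $t = 1$) can be sketched with plain matrix powers. Note that this illustration uses NumPy rather than the R-package expm used in this work, and that the function name and the three-node path graph are hypothetical.

```python
import math
import numpy as np

def truncated_flux_matrix(p, h):
    """K^(h) with t = 1: paths up to length h, entries summing to 1."""
    n = p.shape[0]
    weights = [1.0 / math.factorial(r) for r in range(1, h + 1)]
    power = np.eye(n)
    total = np.zeros((n, n))
    for w in weights:
        power = power @ p          # p^r for r = 1, ..., h
        total += w * power
    return total / (n * sum(weights))

# Toy row-stochastic matrix for a three-node path graph 0 - 1 - 2.
p_path = np.array([[0.0, 1.0, 0.0],
                   [0.5, 0.0, 0.5],
                   [0.0, 1.0, 0.0]])
K1 = truncated_flux_matrix(p_path, 1)
K2 = truncated_flux_matrix(p_path, 2)
```

Each additional order requires one further dense matrix product, which is the main reason the cost grows quickly for networks of several thousand nodes.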

The covariance entropy 

In addition to the information flux entropy, we also consider a 
different type of entropy which merely quantifies the degree of 
similarity of gene expression profiles (as determined by Pearson 
correlations). Given a set of $p$ genes (i.e. the vertices of our PIN or
a subgraph thereof) with mean expression vector $\mu = (\mu_1, \ldots, \mu_p)$
and $p \times p$ covariance matrix $\Sigma$, both computed over the samples
within a given phenotype, its covariance entropy, $S^{\Sigma}$, is given by
(van Wieringen and van der Vaart, 2011)

$$S^{\Sigma} = \frac{1}{2} \log \det \Sigma + \frac{1}{2} p (1 + \log 2\pi) = \frac{1}{2} \sum_{i=1}^{p} \log \lambda_i + \frac{1}{2} p (1 + \log 2\pi)$$

assuming multivariate normality of the expression matrix. Since
typically $p > n_s$ ($n_s$ = number of samples in the given phenotype),
the eigenvalues $\lambda_i$ of the covariance matrix need to be estimated
using a shrinkage estimator (Schaefer and Strimmer, 2005). This
approximation for the entropy was shown to be in good agreement
with non-parametric estimators (van Wieringen and van der Vaart,
2011). When estimating the covariance matrix we would allow
for any pairwise gene covariances, which therefore does not take
the network structure into account. To obtain local estimates of
covariance entropy which do take the local network neighborhood
into account, we compute a covariance entropy for each node $i \in \mathcal{N}$
as,

$$S^{\Sigma}_i = \frac{1}{2} \log \det \Sigma_{i \cup N(i)} + \frac{1}{2} (k_i + 1)(1 + \log 2\pi) \qquad (12)$$

where $\Sigma_{i \cup N(i)}$ is the covariance matrix over the subgraph $i \cup N(i)$,
i.e. the subgraph made up of node $i$ and its neighbours $N(i)$. In this
case, a shrunken estimate of the covariance matrix is only needed
for nodes with degrees $k_i > n_s - 1$. Using a different estimator
for the covariance matrix depending on the degree of the nodes is
justified since we are interested in making comparisons between the
normal and cancer networks and the node degrees are unchanged
between the two phenotypes.

Since we are interested in studying the changes in correlative 
patterns between phenotypes we estimate the covariance entropies 
from a rescaled expression matrix where each feature (gene) has a 
unit variance over the specific phenotype. This then guarantees that
$\Sigma_{ii} = 1 \;\forall i = 1, \ldots, p$. Under these constraints and for fixed $p$, the
maximum possible covariance entropy corresponds to the case when
$\Sigma = I$, i.e. when the covariance matrix is the identity matrix. The
maximum covariance entropy value is then $\frac{1}{2} p (1 + \log 2\pi)$. Thus,
to ensure that the maximum value is independent of $p$ we divide
the above definition of local covariance entropy by the maximum
attainable value, so that $S^{\Sigma}_i \le 1 \;\forall i$.
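
A minimal sketch of the normalised local covariance entropy is given below, using NumPy. It assumes that the number of samples exceeds $k_i + 1$, so that the sample covariance matrix is non-singular and no shrinkage estimator is needed; the shrinkage step itself is not implemented here. The function name, the zero-diagonal adjacency matrix and the random toy data are hypothetical.

```python
import numpy as np

def local_covariance_entropy(expr, adjacency, i):
    """Normalised covariance entropy of node i together with its neighbours."""
    members = np.flatnonzero(adjacency[i]).tolist() + [i]
    x = expr[members]
    # Rescale each gene to zero mean and unit variance within the phenotype.
    x = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, ddof=1, keepdims=True)
    sigma = np.cov(x)                           # unit diagonal after rescaling
    sign, logdet = np.linalg.slogdet(sigma)
    size = len(members)                         # k_i + 1
    s_max = 0.5 * size * (1.0 + np.log(2.0 * np.pi))
    return (0.5 * logdet + s_max) / s_max

# Toy data: 5 genes x 20 samples; gene 0 interacts with genes 1 and 2.
rng = np.random.default_rng(1)
expr = rng.normal(size=(5, 20))
adjacency = np.zeros((5, 5))
adjacency[0, 1] = adjacency[1, 0] = 1.0
adjacency[0, 2] = adjacency[2, 0] = 1.0
s_cov = local_covariance_entropy(expr, adjacency, 0)
```

Since the determinant of a matrix with unit diagonal cannot exceed 1, the log-determinant term is never positive and the normalised value is bounded above by 1.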

Sampling variance using the jackknife 

To estimate the statistical significance of observed differences in 
entropy between two phenotypes, we decided to use the jackknife
procedure (Wu, 1986). Briefly, the jackknife procedure removes
one sample at a time from the given phenotype and recomputes the
desired quantity $S$ (here entropy). Thus, if there are $n$ samples in
the given phenotype one obtains $n$ jackknife estimates $(S_{(j)} : j =
1, \ldots, n)$. A jackknife estimate for the mean $S_M$ and variance $S_V$ of
$S$ is then obtained as

$$S_M = nS - (n-1) \langle S_{(j)} \rangle_j$$

$$S_V = \frac{n-1}{n} \sum_{j=1}^{n} \left( S_{(j)} - \langle S_{(j)} \rangle_j \right)^2$$

where $S$ is the estimate using all $n$ samples and $\langle S_{(j)} \rangle_j =
\frac{1}{n} \sum_{j=1}^{n} S_{(j)}$. Thus, for two phenotypes "N" and "C", we compute
the difference $\Delta S_J$,

$$\Delta S_J = S_M^{(C)} - S_M^{(N)}$$

and obtain a z-statistic

$$z = \frac{\Delta S_J}{\sigma_J} \qquad (13)$$

where $\sigma_J = \sqrt{S_V^{(C)} + S_V^{(N)}}$.

This jackknife procedure can be applied to both flux and covariance 
entropy defined over the network or for each node. Note that in the 
case where we obtain z-statistics for each gene/node, the genes can 
then be ranked according to the significance of this z-statistic. 
It should be pointed out that although bootstrapping provides
an alternative to the jackknife, it is not appropriate here
since resampling with replacement would artificially inflate
correlations (van Wieringen and van der Vaart, 2011). Another
procedure, adopted in (van Wieringen and van der Vaart, 2011),
could be to permute the phenotype labels, so that a given "permuted" 
phenotype contains now a mixture of "normals" and "cancers". 
However, because there are massive differences in expression 
between normal and cancer, this procedure would dramatically alter 
the distribution of correlations within the new permuted phenotypes 
which would also not yield the correct null distribution. Thus, the 
jackknife strategy circumvents this difficulty while also avoiding the 
bias associated with bootstrapping. 
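
The leave-one-out procedure can be sketched generically, for any statistic computed from a list of samples. The code below is a minimal illustration using the standard jackknife estimators for the bias-corrected mean and the variance. The function names are hypothetical, and the toy statistic is simply the sample mean, whereas in the setting of this work `statistic` would recompute an entropy from the expression correlations of the retained samples.

```python
import math

def jackknife(statistic, samples):
    """Return (bias-corrected mean, variance) of `statistic` by leave-one-out."""
    n = len(samples)
    full = statistic(samples)
    # One estimate per left-out sample; `samples` is assumed to be a list.
    loo = [statistic(samples[:j] + samples[j + 1:]) for j in range(n)]
    loo_mean = sum(loo) / n
    mean = n * full - (n - 1) * loo_mean
    variance = (n - 1) / n * sum((s - loo_mean) ** 2 for s in loo)
    return mean, variance

def jackknife_z(statistic, cancer, normal):
    """z-statistic for the difference of a statistic between two phenotypes."""
    mean_c, var_c = jackknife(statistic, cancer)
    mean_n, var_n = jackknife(statistic, normal)
    return (mean_c - mean_n) / math.sqrt(var_c + var_n)

def sample_mean(values):
    return sum(values) / len(values)

# Toy example with invented per-sample values for each phenotype.
toy_cancer = [3.0, 4.0, 5.0, 6.0]
toy_normal = [1.0, 2.0, 3.0, 4.0]
jk_mean, jk_var = jackknife(sample_mean, toy_normal)
z_toy = jackknife_z(sample_mean, toy_cancer, toy_normal)
```

For the sample mean, the jackknife variance reduces to the familiar squared standard error, which provides a convenient sanity check of the implementation.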



RESULTS 

We identified six expression data sets encompassing sufficient 
numbers of normal and cancer tissue samples and which passed 
our quality control criteria (Methods). The tissues profiled were 
bladder, lung, stomach, pancreas, cervix and liver. Integration of 
these expression data sets with our protein interaction network (PIN) 
(Methods) yielded sparse weighted networks of approximately 7500 
nodes and 98500 edges. The average degree, median degree and
diameter of these integrated networks were approximately 26, 8 and
12, respectively. An important assumption underlying any analysis
on these integrated networks is that genes which are neighbors in
the network are more likely to be correlated at the level of gene
expression. While this has been shown for specific data sets (see e.g.
Taylor et al., 2009), we verified that it also holds for the integrated
mRNA-PIN networks considered here (Fig. S1).

Fig. 1. A) Boxplots of local (i.e. per node) flux entropies (y-axis, FluxS) in cancer (C) and normal (N) tissue for all nodes with degree > 10 (~3500 nodes)
and across the six different tissue types. P-values are from a one-tailed unpaired Wilcoxon rank sum test. B) As A) but for the local covariance entropies
(CovS). Both flux and covariance entropies have been normalised so that the maximum attainable value is 1.

Increased information flux entropy as an intrinsic 
property of the cancer cell phenotype 

We previously showed that primary breast cancers that metastasize 
exhibit an increased flux entropy compared to breast cancers that do
not spread (Teschendorff and Severini, 2010). Comparing distinct
cancer phenotypes to each other has the advantage of having access 
to larger sample sizes, thus allowing for more reliable estimates of 
expression correlations. Nevertheless we here sought to determine 
if flux entropy also discriminates cancer from its respective normal 
tissue phenotype. A comparison of the distributions of local flux 
entropies between normal and cancer across six different tissue 
types confirmed that cancer is indeed characterised by an increased 
flux entropy (Fig. 1A). 

To investigate if the specific network dynamics is important 
in characterising the cancer phenotype, we computed a local 
covariance entropy (Methods). While the local covariance entropy 
also takes the neighborhood of each node into account, it is 
derived from a covariance matrix and therefore does not admit 
a dynamical interpretation. Interestingly, and in contrast to flux 
entropy, local covariance entropies were not always significantly 
higher in cancer (Fig. 1B). In fact, in 3 out of 6 tissues, covariance
entropies were lower in cancer (Fig. 1B). The statistics of differential



entropy between cancer and normal, derived from an unpaired non- 
parametric test, were also higher for local flux entropy than local 
covariance entropy (Table 1). The higher discriminatory power of 
flux entropy was retained when a paired non-parametric test was 
used to account for potential dependencies between the normal and 
cancer entropies at each node (Table 1). 

Thus, these results indicate that the specific network dynamics 
considered is of relevance for characterising the cancer phenotype. 
Indeed, only flux entropy provided a consistent discriminator 
between the normal and cancer phenotypes, and moreover, the 
power of discrimination was also consistently higher for flux 
entropy. Thus, the increased dynamical disorder appears to be an 
intrinsic property of the cancer cell phenotype. 
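
One way to obtain an unpaired rank sum statistic that lies between 0 and 1, with 0.5 meaning no discrimination, is to divide the Mann-Whitney U statistic by the number of (cancer, normal) pairs. The sketch below assumes that this is the normalisation intended for the unpaired statistics, which is not stated explicitly here. The function name and the toy entropy values are hypothetical, and the paired test is not covered.

```python
def normalised_rank_sum(cancer, normal):
    """U / (n_c * n_n): fraction of (cancer, normal) pairs with cancer > normal.

    Ties count one half. A value of 0.5 means no discrimination, and values
    near 1 mean higher entropy in cancer.
    """
    wins = 0.0
    for c in cancer:
        for v in normal:
            if c > v:
                wins += 1.0
            elif c == v:
                wins += 0.5
    return wins / (len(cancer) * len(normal))

# Toy local entropies for three nodes in each phenotype (invented values).
stat_toy = normalised_rank_sum([0.9, 0.8, 0.7], [0.6, 0.8, 0.5])
stat_null = normalised_rank_sum([1.0, 2.0], [1.0, 2.0])
```

The double loop is quadratic in the number of nodes, which remains manageable for a few thousand nodes per phenotype.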

Higher order non-local flux entropy 

Given that local flux entropy can discriminate the cancer and normal 
phenotypes, it is natural to ask if higher order flux entropies, 
computed over paths of length larger than 1, are also discriminatory. 
To this end, we computed for the normal and cancer phenotypes, a
higher-order flux entropy

$$S^{(2)}_{\mathcal{N}} \propto -\sum_{ij} K^{(2)}_{ij} \log K^{(2)}_{ij} \qquad (14)$$

where $K^{(2)}_{ij}$ satisfies an approximate diffusion equation over the
network (Methods). We point out that even when $i$ and $j$ are
neighbors, $K^{(2)}_{ij}$ is not equal to $p_{ij}$, since we allow for
alternative signaling paths (of maximum length 2) between genes $i$
and $j$. Thus, this flux entropy also takes the well-known redundancy
of signaling paths into account (Tieri et al., 2010). We observed
a higher flux entropy in cancer compared to normal tissue across
all tissue types, although this was only statistically significant for
the four larger studies (Fig. 2). We also computed higher order
entropies up to paths of maximum length 5. However, as with
$S^{(2)}$, higher order flux entropies $S^{(k)}$, $k \ge 3$, generally exhibited
reduced discriminatory power (data not shown), suggesting that
the interesting dynamical changes associated with flux entropy in
cancer are localised to neighbors and nearest neighbors in the
interaction network.

Table 1. Wilcoxon rank sum test statistics comparing the flux entropies
(FluxS) and covariance entropies (CovS) between normal and cancer,
and across the six tissue types. We provide statistics for both the unpaired
and paired (i.e. treating the cancer and normal entropies for each gene
as dependent variables) version of Wilcoxon rank sum tests. The test-
statistics are one-tailed (hypothesis is that cancer has higher entropy)
and have been normalised to lie between 0 and 1. Values close to 0.5
and less than 0.5 mean no discrimination and higher entropy in normal
phenotype, respectively, while values closer to 1 indicate significantly
higher entropy in cancer.

[...]
PAIRED
FluxS  0.75  0.92  0.69  0.97  0.88  0.88
CovS   0.57  0.86  0.56  0.99  0.75  0.49

Table 2. Relation between differential expression and differential
flux entropy (FluxS), and between differential expression and
differential covariance entropy (CovS). The odds ratio (OR) reflects
the odds of a gene overexpressed in cancer showing reduced
(flux/covariance) entropy in cancer, compared to a gene that is
underexpressed. The P-value (P) reflects the statistical significance
of the odds ratio.

FluxS
OR  [...]  3.07  2.43  2.17  3.64
P   3e-9  0.04  0.05  0.03  0.02  0.005
CovS
OR  1.24  0.73  2.49  < 0.01  0.85  3.70
P   0.09  0.91  6e-6  [...]  0.76  2e-12

Fig. 2. z-statistics of differential non-local flux entropy (x-axis) for the
six different tissues (y-axis). The flux entropy considered here is the
measure which is defined for a stochastic diffusion matrix for maximum path
lengths of order 2 (Methods). Positive z-statistics mean higher entropy in
cancer compared to normal. Green lines indicate the 95% confidence interval
envelope and given P-values are from a normal null distribution centred at 0.

Relation between differential expression and differential 
entropy 

Our underlying biological hypothesis is that genes which exhibit 
an increase in local flux entropy do so because they become 
inactivated in cancer. Thus, one would expect tumour suppressor 
genes to show preferential increases in flux entropy. Conversely, 



one would expect genes that become activated in cancer (e.g 
oncogenes) to exhibit preferential reductions in entropy since the 
activation is likely to lead to the subsequent activation of an 
associated signalling pathway, possibly mediated by one of the 
neighbors of the oncogene. While the activation/inactivation of 
oncogenes/tumour suppressors is caused by underlying genetic 
and epigenetic alterations, the specific alteration patterns are not 
available for the tumours considered here. However, since the 
effects of these alterations are mediated by the corresponding 
changes in gene expression we can directly test the hypothesis 
in relation to the directional changes in gene expression between 
normal and cancer tissue. Thus, for each gene we computed
a regularized t-statistic (Smyth, 2004) that reflects the degree
of differential expression between normal and cancer tissue. 
Similarly, for each gene we used jackknife estimates to derive a z- 
statistic that reflects the degree of differential entropy between the 
normal and cancer phenotype (Methods). Next, we selected those 
genes with significant changes in both differential expression and 
differential flux entropy (P < 0.05). Confirming our hypothesis, 
we observed that genes significantly overexpressed in cancer 
showed preferential reductions in flux entropy compared to genes 
which were underexpressed, and the associated odds ratios were 
statistically significant across all 6 tissue types (Table 2). In contrast, 
this anticorrelation was only observed in 2 of the 6 tissues when the 
covariance entropy was considered (Table 2), further supporting the 
view that the flux dynamics defined on the networks is meaningful 
and that changes in the local flux dynamics around nodes reflect the 
underlying changes in gene expression at these nodes. 
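
The association between the direction of differential expression and the direction of differential entropy can be summarised in a 2x2 table. The sketch below computes the corresponding odds ratio and, as an assumption on our part (the test behind the P-values is not named here), a one-tailed Fisher exact P-value. The function name and the toy statistics are hypothetical, and genes are assumed to have already been filtered for significance.

```python
from math import comb

def odds_ratio_and_fisher(t_stats, z_stats):
    """2x2 association between expression direction and entropy direction.

    t_stats: differential-expression statistics (positive = up in cancer).
    z_stats: differential-entropy statistics (positive = higher in cancer).
    Returns (odds ratio, one-tailed Fisher exact P-value).
    """
    pairs = list(zip(t_stats, z_stats))
    a = sum(1 for t, z in pairs if t > 0 and z < 0)   # up, entropy down
    b = sum(1 for t, z in pairs if t > 0 and z > 0)   # up, entropy up
    c = sum(1 for t, z in pairs if t < 0 and z < 0)   # down, entropy down
    d = sum(1 for t, z in pairs if t < 0 and z > 0)   # down, entropy up
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    row1, col1, total = a + b, a + c, a + b + c + d
    # P(X >= a) for a hypergeometric X: enrichment of "entropy down" among "up" genes.
    p_value = sum(
        comb(col1, x) * comb(total - col1, row1 - x)
        for x in range(a, min(row1, col1) + 1)
    ) / comb(total, row1)
    return odds_ratio, p_value

# Toy statistics for eight genes (invented values).
toy_t = [1, 1, 1, -1, -1, -1, 1, -1]
toy_z = [-1, -1, -1, 1, 1, 1, 1, -1]
toy_or, toy_p = odds_ratio_and_fisher(toy_t, toy_z)
```

An odds ratio above 1 corresponds to the anticorrelation described above, i.e. overexpressed genes being more likely than underexpressed genes to show reduced entropy in cancer.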

Differential local entropy correlates with the signaling 
domain of genes 

It is of interest to explore the pattern of differential local entropy 
in relation to the topological properties and location of the 
genes in the interaction network. Dividing up the genes in the 
network into date-hubs, party-hubs (hub-bottlenecks), nonhub-bottlenecks 
and nonhub-nonbottlenecks using a definition similar 
to that used in Yu et al. (2007) showed that hubs exhibited only 
marginally larger changes in flux entropy (Wilcoxon rank sum 
test P ~ 0.04). However, by dividing up the genes in the 
network into the two major extracellular/membrane (EC) and 
intracellular/nuclear (IC) signaling domains (Komurov and Ram, 
2010), we observed a significantly different pattern of differential 
local entropy, with genes mapping to the intracellular domain 
demonstrating preferential increases in flux entropy, while genes 
encoding membrane receptors and secreted factors exhibited 
preferential decreases (Fig. 3). However, we note that some genes 
in the intracellular domain also exhibited marked reductions in flux 
entropy. 

Fig. 3. The statistics of differential local flux entropy change (y-axis) 
against the main signaling domain (EC: extracellular/membrane receptor, 
IC: intracellular/nuclear) in the bladder cancer set. The P-value is from a 
two-tailed Wilcoxon rank sum test. 

Table 3. Enrichment analysis of cell differentiation (Cell-Diff.) markers 
and cell-cycle genes among the top 10% ranked genes exhibiting entropy 
increases (C>N) and decreases (N>C). The enrichment odds ratio (OR) and 
P-value (P) are from a one-tailed Fisher exact test. NA = not available due to 
insufficient number among the top 10%. 
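
The comparison of differential flux entropy between the IC and EC signaling domains (Fig. 3) relies on a two-tailed Wilcoxon rank sum test. A minimal Python sketch of such a test is given here, assuming the normal approximation and omitting the tie correction to the variance. The function name and the example values are hypothetical and are not data from this study.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test using the normal approximation.

    Ties receive average ranks; the tie correction to the variance is
    omitted for brevity, so this is only a sketch for data with few ties.
    Returns (z, p_value); z > 0 means values in x tend to exceed those in y.
    """
    pooled = sorted((value, group)
                    for group, values in enumerate((x, y))
                    for value in values)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the ranks i+1, ..., j
        rank_sum_x += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n, m = len(x), len(y)
    expected = n * (n + m + 1) / 2
    sd = math.sqrt(n * m * (n + m + 1) / 12)
    z = (rank_sum_x - expected) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value

# Hypothetical differential flux entropy values (cancer minus normal).
ic_genes = [0.08, 0.05, 0.11, 0.02, 0.07, 0.09, -0.04, 0.06]
ec_genes = [-0.03, -0.06, 0.01, -0.08, -0.02, -0.05, 0.03, -0.01]
z, p = rank_sum_test(ic_genes, ec_genes)
print(f"z = {z:.2f}, P = {p:.4f}")
```

For real data with many ties or small samples, an exact or tie-corrected routine (for example `scipy.stats.ranksums` or `scipy.stats.mannwhitneyu`) would be the safer choice.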



Differential flux entropy captures dynamical changes in 
the cell-cycle and not stromal variations 

The observed dependence of differential entropy patterns on the 
signaling domain supports the view that most of these patterns 
are due to signaling changes in the epithelial tumour cells, and 
we posited that the small number of intra-cellular genes showing 
significant entropy decreases (Fig. 3) could reflect the increased 
activity of the cell-cycle in cancer. We thus performed a gene 
set enrichment analysis using the Molecular Signatures Database 
(MSigDB; Subramanian et al., 2005) on the top ranked genes, 
ranked according to the magnitude of differential flux entropy 
(separately for increased and reduced entropy). In doing so, we also 
sought to determine the role, if any, that changes in immune and 
stromal cell composition could have. To this end we constructed 
a set of well known cell differentiation and surface markers from 
MSigDB. Confirming our hypothesis, genes implicated in the cell- 
cycle showed statistically significant reductions in flux entropy 
across most of the studies (Table 3). In contrast, cell differentiation 
markers were generally not enriched and only showed a more 
marginal trend towards lower flux entropy in cancer, also consistent 
with cancers exhibiting a somewhat higher level of immune/stromal 
cell infiltration. Together, these results indicate that flux entropic 
changes are capturing mostly changes in the intracellular wiring of 
the tumour epithelial cells. 
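
The enrichment of a gene set among the top 10% of ranked genes, tested with a one-tailed Fisher exact test as in Table 3, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the ranked list contains each gene once, the gene universe is the full ranked list, and the function name, gene identifiers and gene set are all hypothetical.

```python
from math import comb

def top_fraction_enrichment(ranked_genes, gene_set, top_fraction=0.10):
    """Odds ratio and one-tailed Fisher exact P-value for enrichment of
    gene_set among the top-ranked fraction of ranked_genes."""
    universe = set(ranked_genes)
    top_n = max(1, int(len(ranked_genes) * top_fraction))
    top = set(ranked_genes[:top_n])
    members = set(gene_set) & universe
    a = len(top & members)            # top-ranked, in gene set
    b = top_n - a                     # top-ranked, not in gene set
    c = len(members) - a              # not top-ranked, in gene set
    d = len(universe) - top_n - c     # not top-ranked, not in gene set
    # One-tailed P-value: probability of observing at least `a` members
    # in the top list under the hypergeometric null.
    total, k_set = len(universe), len(members)
    tail = sum(comb(k_set, k) * comb(total - k_set, top_n - k)
               for k in range(a, min(k_set, top_n) + 1))
    p_value = tail / comb(total, top_n)
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    return odds_ratio, p_value

# Hypothetical ranking (largest entropy reduction first) and gene set.
ranked = [f"g{i}" for i in range(100)]
cell_cycle_like = {"g0", "g1", "g2", "g3", "g50", "g77"}
odds, p_enrich = top_fraction_enrichment(ranked, cell_cycle_like)
print(f"OR = {odds:.1f}, P = {p_enrich:.2e}")
```

Running the same function separately on the ranking by entropy increase and the ranking by entropy decrease would mirror the C>N and N>C columns of Table 3.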



DISCUSSION 

In this study we have performed a detailed comparison of molecular 
entropy metrics in cancer and have in the process elucidated a 
system-omic principle underlying cancer. By defining a dynamics 
of information flux on a network of protein interactions, we have 
shown that entropy associated with this dynamics is increased in 
the cancer phenotype. Importantly, in the absence of this network 
dynamics, the molecular entropy no longer provides a consistent 
discriminator of the cancer phenotype. This is an important insight, 
because it shows that the specific network dynamics considered has 
a biological meaning of relevance to cancer. 

It is of importance to discuss (i) what may cause cancer cells 
to exhibit this increase in dynamical entropy and (ii) what it 
may mean for the cancer phenotype itself. Concerning the first 
question, one would expect a gene that becomes inactivated in 
cancer to represent a focus of increased flux entropy, since the 
inactivation compromises its biological function: at the level of 
mRNA expression this would manifest itself as reduced expression 
correlations with its interacting neighbors, but more generally as an 
increased uncertainty as to which neighbors it may interact with. 
Conversely, for a gene that is overactivated in cancer, its biological 
function is enhanced, leading to an increased flux of the associated 
oncogenic pathway. In terms of local flux entropy, this increased 
flux along a particular path in the network corresponds to a reduced 
uncertainty as to the path along which the information is transferred. In 
line with these biological expectations we did observe that genes 
overexpressed in cancer were significantly more likely to exhibit 
reductions in flux entropy than underexpressed genes. Interestingly, 
this anticorrelation between local differential expression and 
entropy was more consistent (across studies) for flux entropy than 
covariance entropy, further suggesting that the network dynamics is 
biologically meaningful and that changes in this dynamics reflect 
the underlying changes in gene expression. It will be interesting 
to test these hypotheses further using multidimensional cancer 
genomic profiles that encompass mutational, DNA methylation, 
copy-number and mRNA expression profiles for the same tumours 
(see e.g. TCGA, 2011), although an objective comparison will 
require equal numbers of normal tissue samples, which for most 
cancers are not yet available (TCGA, 2011). However, already 
supporting the biological and clinical relevance of our flux entropy 
measure, we indeed observed that many of the genes exhibiting 
the largest reductions in entropy were either known or candidate 
oncogenes. For instance, we observed AURKB to be the most highly 
ranked gene in bladder cancer. Given the well established oncogenic 
role of AURKA in bladder cancer (see e.g. Park et al. (2008)), our 
analysis therefore suggests that the closely related kinase, AURKB, 
which has already been implicated as an oncogene and potential 
drug target in other cancers (Lens et al., 2010; Lucena-Araujo et al., 
2011; Morozova et al., 2010), may also play an equally important 
role in the pathogenesis of bladder cancer. 
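
The intuition given earlier in this discussion, that an inactivated gene spreads its flux evenly over its neighbors while an overactivated gene concentrates it along one path, can be made concrete with a small Python sketch. It assumes that local flux entropy is the Shannon entropy of a node's outgoing transition probabilities, normalised by the logarithm of the node degree; the definition actually used is the one given in Methods. The function name and the weights are hypothetical.

```python
import math

def local_entropy(weights):
    """Normalised Shannon entropy of one node's outgoing transition probabilities.

    `weights` are non-negative edge weights to the node's neighbours
    (hypothetical inputs); they are rescaled to sum to 1. The result lies
    in [0, 1]: 1 for a uniform spread over neighbours, 0 when all of the
    flux goes to a single neighbour.
    """
    total = sum(weights)
    if len(weights) < 2 or total <= 0:
        raise ValueError("need at least two neighbours and a positive total weight")
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(weights))

# Inactivated gene: no preferred neighbour, so uncertainty is maximal.
inactivated = local_entropy([1.0, 1.0, 1.0, 1.0])
# Overactivated gene: one dominant path carries most of the flux.
overactivated = local_entropy([0.85, 0.05, 0.05, 0.05])
print(f"inactivated = {inactivated:.3f}, overactivated = {overactivated:.3f}")
```

Under this assumed definition, the concentrated weight profile yields the lower entropy, in line with the expectation that overexpressed genes tend to show reductions in flux entropy.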

It will also be interesting to explore the changes in the network 
dynamics, including flux entropy, in the context of the different 
types of network motifs (Cui et al., 2007). At this stage we can only 
speculate that the observed global increase in flux entropy reflects a 
higher frequency of inactivating over activating alterations in cancer. 
Intuitively, one would expect cancer cells to be characterised by 
many more inactivating changes since a random mutation/alteration 
is more likely to inactivate than activate a gene, and indeed this 
would be in agreement with recent reports suggesting that most 
genetic alterations are inactivating and affect tumour suppressors 
(Wood et al., 2007). Supporting this further, we have seen that the 
preferential increase in flux entropy was observed mainly for genes 
in the intracellular domain, consistent with the evidence that most of 
the genetic and epigenetic alterations affect genes in this signaling 
domain (Forbes et al., 2011). 

Concerning the second question posed above, we propose that 
the increased flux entropy in cancer may endow cancer cells 
with the flexibility to adapt to the strong selective pressures 
of the tumour microenvironment. Moreover, the increased flux 
entropy could underpin the intrinsic robustness of cancer cells to 
endogenous and exogenous perturbations, including therapeutic 
intervention. Indeed, a general fluctuation theorem from statistical 
mechanics (Manke et al., 2005, 2006) states that changes in network 
(topological) entropy, $\Delta S$, and robustness, $\Delta R$, are correlated, i.e. 

$$\Delta S \, \Delta R > 0 \qquad (15)$$

Thus, according to this theorem, if a node associated with high 
network entropy is removed, i.e. if $\Delta S < 0$, then $\Delta R < 0$, meaning 
that the network is less robust as a result of this perturbation. In 
the context of our dynamical entropy and in comparing normal 
to cancer tissue, we are interested in those genes which show the 
largest changes in differential flux entropy. Thus, cancer alterations 
that lead to significant increases in flux entropy may contribute 
to the dynamical robustness of these cancer cells, whilst those 
alterations that are associated with reductions in flux entropy may 
make these cells less dynamically robust. Interestingly, this would 
fit in well with one of the important cancer hallmarks, namely, 
that of oncogene addiction, whereby cancer cells become overly 
reliant on a specific oncogenic pathway (Hanahan and Weinberg, 
2011). In the case that the activated oncogene is druggable (e.g. 
ERBB2 in breast cancer), targeting of this oncogene has proved to be 
an effective drug therapy strategy (Hanahan and Weinberg, 2011). 



However, the most common scenario is one where the oncogene is 
not directly druggable. Thus, in these cases it may be possible to 
use differential flux entropy to identify (neighboring) viable drug 
targets that also exhibit significant reductions in flux entropy. This 
novel computational strategy could therefore guide non-oncogene 
addiction based therapeutic strategies that aim to select drug targets 
within the same oncogenic pathway (Luo et al., 2009a,b). 



CONCLUSIONS 

In summary, in this work we have adapted a known graph theoretical 
framework for studying dynamics on networks to elucidate a 
system-omic hallmark of cancer. Specifically, we have shown 
that the cancer cell phenotype is characterised by an increase in 
information flux entropy, and that this increase is intricately linked 
to the network and the dynamics defined on it. Further investigation 
of the statistical mechanical principles characterising cancer gene 
networks is warranted as it may help rationalize the choice of drug 
targets. 

SUPPLEMENTARY FIGURE LEGENDS 

Fig. S1 

Comparison of local average Pearson correlation coefficients (PCC) 
with non-local average PCC. Shown are the densities (y-axis) of 
the correlation values (x-axis). PCC were computed over normal 
samples only, for six different tissues as indicated. In the local case, 
for each node an average over the nearest neighbors in the PIN is 
computed. In the non-local case (green), for each node the average 
is computed over a random selection of other nodes in the PIN, and 
shown are the densities for 10 different randomisations. P-values 
are from paired Wilcoxon rank sum tests testing the difference 
between the local and non-local distributions. 
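
The local versus non-local comparison described in this legend can be sketched in Python as follows. This is a minimal illustration, assuming that for each node the non-local average uses as many randomly chosen other nodes as the node has neighbors (the legend does not state the sample size), and that the random selection may include neighbors. All function names, the toy network and the expression values are hypothetical.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_pcc(node, partners, expression):
    """Average PCC between one node and a list of partner nodes."""
    return mean(pearson(expression[node], expression[p]) for p in partners)

def local_and_non_local(network, expression, rng):
    """Per-node average PCC over PIN neighbours and over random other nodes."""
    nodes = list(network)
    local, non_local = {}, {}
    for node, neighbours in network.items():
        local[node] = average_pcc(node, neighbours, expression)
        others = [n for n in nodes if n != node]
        random_partners = rng.sample(others, len(neighbours))
        non_local[node] = average_pcc(node, random_partners, expression)
    return local, non_local

# Toy network (adjacency lists) and expression profiles over five samples.
network = {"A": ["B"], "B": ["A"], "C": ["D"], "D": ["C"]}
expression = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [1.1, 2.3, 2.9, 4.2, 4.8],
    "C": [5.0, 1.0, 4.0, 2.0, 3.0],
    "D": [4.8, 1.3, 4.1, 1.9, 3.2],
}
local, non_local = local_and_non_local(network, expression, random.Random(0))
```

Repeating the call with different random seeds would give the multiple randomisations whose densities are overlaid in the figure.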